Abstract - Prognostic techniques are intricately tied to the
physics  of  incipient-fault-to-failure  progression,  and  hence 
most  prognostics  research  has  focused  on  developing 
techniques  for  a  range  of  components  such  as  rotating 
machinery  parts.  The  research  and  development  of  such 
techniques  has  relied  on  the  theories  of  material  science, 
structural  mechanics,  domain  expertise,  as  well  as  empirical 
studies  such  as  accelerated  run-to-failure  testing.  Even  after 
prognostic  models  have  been  developed  and  operationally 
validated  for  various  components  of  a  system,  the  challenge 
remains  how  prognostic  assessments  from  individual 
components  of  a  system  (such  as  an  aircraft  engine)  should 
be used to make maintenance and logistics decisions. In this
paper, we describe an integration process where the primary
focus is on bridging the gap between the individual
component prognosis and the system-level reasoning
required to support maintenance and inventory management
decisions. The research involves integration of component
health  assessment  with  an  information  fusion  mechanism 
that  operates  in  conjunction  with  a  higher-level  reasoning 
engine  which  utilizes  system-level  structural  and  functional 
dependencies.  The  higher-level  reasoning  engine  generates 
a  system  availability  analysis  that  leads  directly  to  actionable 
tasks  for  the  inventory  and  maintenance  management 
decision  support  systems.  The  inventory  management 
decision  support  system  involves  predicting  the  spares 
requirements,  and  when  this  is  integrated  with  remote  health 
monitoring  and  intelligent  diagnostics  and  prognostics,  it 
can  assess  different  sparing  allocation  schemes,  and 
maximize  system  availability  within  budget  constraints. 
1. Introduction

A.  Maintenance  Management 

Current maintenance practices in aviation, in particular for
rotating machinery in Air Force aircraft engines such as the
turbine engine disks, result in the replacement of 99 “good” or
working components just to insure against a single “bad” or
cracked disk [1]. Since the DoD is increasingly keeping older
aircraft and flying them past their originally estimated life,
yearly expenditures on replacing the engine disks of older
aircraft alone are in excess
of  $100  million.  Considering  the  false  removals,  the 
unnecessary  expenditure  on  working  engine  disks  constitutes 
$99  million  out  of  the  $100  million.  Similar  statistics  are 
likely  for  other  critical  engine  components,  such  as  the  fan 
compressor  and  the  blades. 

Excessive  false  removals  are  only  part  of  the  story.  The 
current  state-of-the-art  on  detecting  cracks  in  rotating  parts 
of  an  engine  is  primarily  through  visual  inspections  and 
borescope  visualizations.  Cracks  and  crack  propagation  in 
the  fan  compressor,  blades  and  disks,  chipped  or  cracked 
gear  teeth  are  all  of  serious  concern  to  the  aviation  industry 
as well as the DoD. The entire process is predictably rife
with false alarms and, on a more dangerous note, could
potentially harbor a small number of missed detections as
well (recall from receiver operating characteristic (ROC)
analysis that a high false-alarm rate is accompanied by a
lower missed-detection rate). In those rare cases which may go
unnoticed, the result can often be catastrophic.

A  well  implemented  component-level  and  system-level 
prognostic  system,  as  shown  in  the  idealized  schematic  in 
Figure  1,  can  alleviate  some  of  the  shortcomings  identified 
above.  Each  intersecting  node  in  the  figure  represents  the 
health  management  capabilities  of  the  specific  monitoring 
functions.  While  many  such  systems  do  analyze,  detect  and 
assess the current health state of the components, of late,
only  a  few  of  the  tools  venture  into  the  area  of  damage 
prediction  and  provide  actionable  recommendations  for 
managing  the  component.  Consequently,  systems  that  utilize 
information  related  to  material-level  prognosis  and  that  can 
aggregate  this  information  to  determine  the  functional  state 
of  the  component  facilitate  a  more  effective  condition-based 
maintenance  scheduling  than  time-based  preventive 
maintenance  scheduling  practiced  today.  This  also  results  in 
better  utilization  of  the  component  and  significantly  reduced 
false  removals. 


Figure 1: Algorithmic view of a comprehensive PHM
(Prognostic Health Management) solution. In this case, the
monitoring functions depicted are relevant to an aircraft
engine.

A prognostic system that drives an improvement in
maintenance management will typically require integration
of three essential capabilities: physics-based health and
damage propagation modeling; adequate non-invasive
mechanisms to determine the appropriate quantitative
knowledge that leads to health assessment; and an
information fusion mechanism that operates in conjunction
with a higher-level reasoning engine.

B.  Inventory  Management 

Inventory  management  in  the  aviation  industry  is 
challenging  due  to  the  following  difficulties  [2]: 

■ High Inventory Value - Aviation equipment is highly
specialized with expensive spare parts; therefore,
holding those parts in inventory to ensure system
availability can result in enormous inventory costs.

■ Distributed Vendors - Outsourcing service to third-party
aviation vendors is common; hence
collaborative planning, global visibility and service
coordination among multiple vendors are essential.

■ Fleet Availability Targets - In addition to fill rate for
spares, service organizations must be able to meet
fleet availability targets.

■ Sporadic and Intermittent Demand - Causal factors
such as operating hours and operating conditions
result in high demand variability. Without reliable
demand histories, it is difficult to calculate the
probability of failure events.

■ Procurement Lead-Time Variability - Spare parts for
aircraft systems often have long lead-times, making
spares optimization difficult.

Despite the difficulties in establishing efficient inventory
management for such complex and dynamic systems, the
return on investment is high. AeroStrategy's analysis [3]
suggests inventory worth $44 billion is now held in the air
transport MRO (Maintenance, Repair, and Overhaul) supply
chain, supporting nearly 17,000 aircraft. This implies that
about $2.5 million worth of inventory is available for each
aircraft. Significant cost savings can therefore be achieved
through efficient inventory management: a mere 5%
improvement corresponds to about $2.2 billion of inventory.

It  has  been  shown  that  utilizing  information  about  the 
demand  and  inventory  activities  and  incorporating  them  in 
day-to-day  decision-making  in  supply  chain  management 
will  help  us  achieve  better  material  flow  and  on-time 
deliveries  [4].  However,  due  to  the  need  for  significant 
human  involvement,  the  information  needed  for  demand 
forecasting  is  very  often  buried  in  large  volumes  of 
operational  and  maintenance  data  collected  and  left 
unutilized.  Therefore,  it  will  be  especially  beneficial  if  we 
possess  the  automated  reasoning  capabilities  to  identify 
potential  problem  systems  or  components;  update  repair 
times  and  failure  rates  from  maintenance  log  data  or 
operational data; and use such diagnostic and prognostic
information for inventory management.

In this paper, we demonstrate a seamless process in which
we utilize the material-level health assessment and damage
prognosis, together with the system-level structural and
functional dependencies, to generate a subsystem or component
prognostic analysis that leads directly to actionable
decision-making tasks for spares allocation and inventory
management. Such a system, when
effectively implemented, can enhance the ability of
decision-makers to efficiently deploy and manage their assets in
rapidly evolving combat situations where demands on the
component can be intense and stressful.

C.  Scope  and  Organization  of  the  Paper 

The  paper  is  organized  as  follows.  Section  2  outlines  a 
smart-service  concept  in  which  diagnostic  and  prognostic 
reasoners  work  hand-in-glove  with  the  inventory 
management  module  for  planning  maintenance  actions  and 
replenishing  supplies  preemptively.  Section  3  describes  a 
model-based  prognostic  process  and  reports  on  a 
demonstration  of  the  prognostic  procedure  on  a  generic 
centrifugal  pump  system.  In  Section  4,  we  describe  a 
simulation-based  approach  to  system  availability  analysis, 
and  demonstrate  an  integrated  process  in  which  the  updated 
residual  life  predictions  obtained  from  prognostics  are  used 
to forecast service demand and to assess the efficacy of
different spares allocation schemes in maximizing system
availability using an engine system as an example. Section 5
concludes the paper with a summary and future research
directions.


2.  Concept  for  a  “Smart  Services”  Solution 


Figure  2  depicts  our  concept  of  how  an  inventory 
management  module  can  be  deployed  and  integrated  with 
diagnostics  and  prognostics  to  provide  smart  services,  i.e., 
planning  maintenance  actions  and  replenishing  supplies 
preemptively,  rather  than  retroactively.  Unlike  conventional 
maintenance  strategies,  prognostic  techniques  predict 
component  degradation  based  on  observed  system  condition 
to support “just-in-time” maintenance. The ever-increasing
use of model-based diagnosis and prognosis makes it easier
to integrate these functions with inventory management,
leading to condition-based spares
management and maintenance. In the proposed architecture,
the  demand  forecasting  and  inventory  management  module 
incorporates  updates  on  system  operating  conditions  and 
health  information  collected  and  inferred  by  a  diagnosis  and 
prognosis  server.  The  functions  of  the  blocks  developed  in 
this  paper  are  discussed  in  greater  detail  below,  while  the 
spares allocation optimization module will be included in
our future research.


Figure 2: Concept for a “Smart Services” Solution
Supported by Diagnostics and Prognostics


A.  Demand  Forecasting 

Demand  for  spares  cannot  be  easily  gauged  because  the 
consumption  of  spares  is  event-based  and  therefore  is 
probabilistic  in  nature.  The  events  themselves  can  be 
scheduled  (planned)  or  unscheduled  (unplanned).  Scheduled 
maintenance,  system  overhauls,  etc.,  would  fall  in  the  first 
category,  whereas  random  breakdowns  would  be  in  the 
second category. The intermittent nature of spare part
demand due to breakdowns makes it impossible to apply
conventional time-series-based algorithms. Moreover, since
the real operating conditions of a component usually differ
from assumed ones, discrepancies arise between the specified
constant mean-time-to-failure and the true residual life time.
Predictions  of  residual  life  time  and  their  variances  based  on 
observed  environment  (e.g.,  temperature,  humidity  and  dust) 
and  operating  conditions  (e.g.,  vibration  and  pressure)  or 
historical/field  data  have  inherently  much  higher  quality  than 
constant  mean-time-to-failure  data.  Model-based  prognostic 
approaches  are  applied  to  obtain  component  degradation 
profiles,  estimate  the  consequences  of  such  degradations  on 
system  performance  variables,  and  dynamically  evolve  the 
residual  life  time  prediction  based  on  the  load  and 
environmental  conditions. 

The  main  idea  is  to  move  away  from  traditional  demand- 
driven  forecasting  that  relies  on  a  time-based  scheduled 
maintenance  approach  to  condition-based  forecasting  using 
remaining life predictions of components. We describe a
model-based prognostic approach in Section 3 for computing
the  remaining  useful  life  of  a  component. 
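
As a minimal sketch of how remaining-life predictions could feed a condition-based demand forecast, the snippet below sums, over installed components, the probability that each one fails within a planning horizon. It assumes, purely for illustration, that each residual life is Gaussian with the predicted mean and standard deviation; the function names and the fleet numbers are hypothetical and are not taken from the paper.

```python
import math

def prob_failure_within(horizon, rul_mean, rul_std):
    """P(residual life <= horizon) under an ASSUMED Gaussian residual-life model."""
    z = (horizon - rul_mean) / (rul_std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def expected_spares_demand(horizon, rul_predictions):
    """Expected number of replacements needed within the horizon.

    rul_predictions: list of (mean, std) residual-life predictions, one per
    installed component, in the same time unit as horizon.
    """
    return sum(prob_failure_within(horizon, m, s) for m, s in rul_predictions)

# Hypothetical fleet: residual-life predictions (mean, std) in months.
fleet = [(6.0, 2.0), (12.0, 3.0), (30.0, 5.0)]
demand = expected_spares_demand(12.0, fleet)
print(f"expected demand over 12 months: {demand:.2f}")
```

Because the forecast is driven by the predicted residual lives and their variances rather than by a constant mean-time-to-failure, it changes as soon as the prognostic estimates are updated.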

B.  System-level  Availability  Estimation 

After obtaining the remaining useful life estimates using
model-based prognostic approaches, we perform system-level
availability analysis to evaluate the performance of
each candidate spares allocation scheme. The system
availability analysis methodology is hierarchical and
comprises models at two tiers. At the lower tier is a
component-level model that considers the impact of spares
on component availability. At the higher tier is a system-level
model that considers the impact of system architecture,
mission phases, and different system configurations over
mission phases on system availability. The component
availabilities computed from the lower-tier models are
propagated to the higher-tier model to enable the
computation of system availability. The details of the system
availability analysis approach are described in Section 4.


3.  Model-based  Prognostic  Techniques 

A.  Approach 

The  model-based  prognostic  approaches  are  applicable  in 
situations where accurate mathematical models can be
constructed  from  first  principles.  These  methods  use 
residuals  as  features,  where  the  residuals  are  the  outcomes  of 
consistency  checks  between  the  sensed  measurements  of  a 
system  and  the  outputs  of  a  mathematical  model.  The 
premise  is  that  the  residuals  are  large  in  the  presence  of 
malfunctions,  and  small  in  the  presence  of  normal 
disturbances,  noise  and  modeling  errors.  Statistical 
techniques  are  used  to  define  the  thresholds  to  detect  the 
presence  of  faults.  The  three  main  ways  of  generating  the 
residuals  are  based  on  parameter  estimation,  observers  (e.g., 
Kalman filters, reduced order unknown input observers,
Interacting  Multiple  Models  [7])  and  parity  relations. 
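
The residual-and-threshold idea can be sketched as follows. The residual is the difference between a sensed measurement and the model output, and the threshold is set statistically from fault-free data. The three-sigma rule and all numbers below are assumptions made for illustration; the paper does not specify which statistical test is used.

```python
import statistics

def fit_threshold(nominal_residuals, k=3.0):
    """Threshold = mean + k * std of |residual| under fault-free operation.

    The k-sigma rule is an assumed choice, not the paper's stated method.
    """
    mags = [abs(r) for r in nominal_residuals]
    return statistics.mean(mags) + k * statistics.pstdev(mags)

def detect_faults(measured, predicted, threshold):
    """Flag samples whose residual magnitude exceeds the threshold."""
    return [abs(y - y_hat) > threshold for y, y_hat in zip(measured, predicted)]

# Hypothetical residuals collected under normal disturbances and noise.
nominal = [0.02, -0.01, 0.03, -0.02, 0.01, 0.00, -0.03, 0.02]
threshold = fit_threshold(nominal)

# Hypothetical sensed values against a model output of 1.00.
flags = detect_faults([1.01, 0.99, 1.30], [1.00, 1.00, 1.00], threshold)
print(flags)
```

Small residuals from noise and modeling error stay below the threshold, while the large residual in the last sample is flagged.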

Figure  3  illustrates  our  adaptive  model-based  prognostic 
process  [6].  In  this  process,  first,  a  system  degradation 
model  is  identified  and  data  from  model-based  simulations 
under  nominal  and  degraded  conditions  are  collected. 
Oftentimes,  simulation  tools  have  already  been  developed, 
such  as  the  TRICK  simulator  for  the  orbital  maneuvering 
system/reaction  control  system  of  NASA  spacecraft  and  the 
STORM  model  for  P&W  engines;  then  we  can  use  these 
tools  directly  to  obtain  operational  data  under  different 
usage  profiles. 


Figure 3: A Model-based Prognostic Process


Because  of  the  continuously  changing  nature  of  an  abnormal 
condition,  the  severity  of  a  fault  increases  with  the  usage  of 
the  system.  This  change  in  fault  severity  over  time  forms  a 
trajectory  of  degradation,  which  is  dependent  on  the  usage 
profiles  (environmental  and  operating  conditions). 
Therefore,  in  the  second  step  of  the  process,  prognostic 
models  based  on  different  random  usage  profiles  and 
conditions  (termed  modes)  are  constructed.  Third,  the 
Interacting  Multiple  Model  (IMM)  approach  is  used  to  track 
the  hidden  damage  to  make  the  prognosis  adaptive  to  the 
current  usage  profile,  while  remaining  useful  life  prediction 
is  performed  by  mixing  mode-based  life  predictions  via 
time-average  mode  probabilities.  A  by-product  of  this 
process is the prediction of Time to Criticality, defined as
the  time  from  the  indication  of  a  fault  or  degradation  of  a 
function  to  the  complete  failure  of  that  function.  Comparing 
this  measure  with  the  Time  to  Remediate  will  offer  us 
insights  into  whether  a  catastrophic  system  failure  can  be 
confidently  prevented.  The  solution  proposed  here  is  generic 
and  has  the  potential  to  be  applicable  to  a  variety  of  aviation 
systems  and  their  components.  The  main  advantage  of  such 
a  model-based  prognostic  process  over  data-driven 
approaches  is  its  ability  to  incorporate  a  physical 
understanding  of  the  system  for  monitoring.  Another 
advantage  is  that,  in  many  situations,  the  changes  in  feature 
vector  are  closely  related  to  model  parameters.  Therefore,  it 
can also establish a functional mapping between the drifting
parameters and the selected prognostic features. Moreover,
as  understanding  of  the  system  degradation  improves,  the 
model  can  be  adapted  to  increase  its  accuracy  and  to  address 
subtle  performance  problems.  Thus,  our  approach  may  be 
viewed  as  a  Bayesian  approach  to  prognosis,  as  opposed  to  a 
Maximum  likelihood  approach  based  purely  on  data. 
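
The mixing of mode-based life predictions via time-averaged mode probabilities can be sketched as below. This is a simplified reading of the step described above; the exact mixing rule is given in [6] and may differ. The probability history and the per-mode life predictions are hypothetical.

```python
def time_averaged_probs(prob_history):
    """Average each mode's probability over the observation history."""
    n_steps = len(prob_history)
    n_modes = len(prob_history[0])
    return [sum(step[j] for step in prob_history) / n_steps
            for j in range(n_modes)]

def mixed_residual_life(prob_history, mode_life_predictions):
    """Weight per-mode residual-life predictions by time-averaged mode probabilities."""
    weights = time_averaged_probs(prob_history)
    return sum(w * life for w, life in zip(weights, mode_life_predictions))

# Hypothetical mode probabilities at three time steps (each row sums to 1).
history = [[0.8, 0.1, 0.1],
           [0.6, 0.3, 0.1],
           [0.1, 0.8, 0.1]]
# Hypothetical residual-life predictions (years) if each mode stayed in effect.
mode_lives = [9.0, 5.0, 4.0]

mixed = mixed_residual_life(history, mode_lives)
print(f"mixed residual life: {mixed:.2f} years")
```

Averaging over time makes the prediction reflect the usage profile seen so far rather than only the mode in effect at the latest step.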

As  illustrated  in  Figure  3,  our  adaptive  model-based 
prognostic  process  consists  of  six  steps  for  predicting  the 
residual life time of a component/system. The details of
each  step  can  be  found  in  [6].  In  Section  B.2,  we 
demonstrate  the  six-step  prognostic  process  using  an 
example  scenario. 

B. Application to an Example System

In  this  section,  we  study  the  system  model  of  a  generic 
centrifugal  pump  system  and  predict  the  residual  lifetime  of 
the  system  using  the  process  elaborated  above. 

B.1 Basic Principles of a Generic Centrifugal Pump

The  purpose  of  a  centrifugal  pump  is  to  convert  the  energy 
of  an  electric  motor  or  engine  into  kinetic  energy,  and  then 
into  pressure  of  a  fluid  that  is  being  pumped.  Figure  4 
illustrates  the  operation  of  a  centrifugal  pump. 


Figure  4:  Operation  of  a  Centrifugal  Pump 

The simplified physical model of a centrifugal pump is
derived using conservation of power and momentum. The
corresponding equation is

$$\tau \, \theta = p_{out} \, \phi_{out} \tag{1}$$

where $\tau$ is the input torque, $\theta$ represents the angular velocity of
the pump rotor, $p_{out}$ is the pump pressure, and $\phi_{out}$ is the
corresponding mass flow rate.

Conservation of momentum states that the mechanical
momentum $\int \tau \, dt$ equals the hydraulic momentum. The
coefficient of transmission of the gyrator model includes two
parameters, $a$ and $b$, that represent the cross-sectional area
and curvature of the vanes. The amount of mass moved by
the pump depends on the total area of its vanes, $a$, minus the
effective loss in moved mass due to the curvature of the
vanes, $b$. This is given as $\int (a\theta - b\phi_{out}) \, dt$.

The hydraulic momentum of the pump is represented by
$\phi_{out} \int (a\theta - b\phi_{out}) \, dt$. Therefore

$$\int \tau \, dt = \phi_{out} \int (a\theta - b\phi_{out}) \, dt \tag{2}$$

Eq. (2) can be rewritten as:

$$\tau = \phi_{out} (a\theta - b\phi_{out}) + \dot{\phi}_{out} \int (a\theta - b\phi_{out}) \, dt \tag{3}$$

which, for relatively low flow accelerations compared to
flow velocity, yields the constituent relation
$\tau = \phi_{out} (a\theta - b\phi_{out})$. This yields

$$p_{out} = (a\theta - b\phi_{out}) \, \theta \tag{4}$$

Eq. (4) describes a modulated gyrator with modulus
$(a\theta - b\phi_{out})$.
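
A short numerical check of the gyrator relations may help. The sketch below evaluates the low-acceleration constituent relation for the torque and Eq. (4) for the pressure, then confirms that the power balance of Eq. (1) holds. The area and angular velocity are the values used later in the simulation; the curvature parameter b and the flow rate are hypothetical values chosen only for illustration.

```python
def pump_torque(a, b, theta, phi_out):
    """Constituent relation at low flow acceleration: tau = phi_out * (a*theta - b*phi_out)."""
    return phi_out * (a * theta - b * phi_out)

def pump_pressure(a, b, theta, phi_out):
    """Eq. (4): p_out = (a*theta - b*phi_out) * theta."""
    return (a * theta - b * phi_out) * theta

a, theta = 0.3, 165.0     # vane area [m^2] and angular velocity [rad/s] from the simulation setup
b, phi_out = 0.01, 50.0   # HYPOTHETICAL curvature parameter and mass flow rate

tau = pump_torque(a, b, theta, phi_out)
p_out = pump_pressure(a, b, theta, phi_out)

# Eq. (1): mechanical power in equals hydraulic power out.
power_in = tau * theta
power_out = p_out * phi_out
print(tau, p_out, power_in, power_out)
```

Both relations share the modulus (a*theta - b*phi_out), which is why the power balance is satisfied for any choice of the parameters.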

B.2  Component-level  Degradation  Model 

The analysis done here is only to explore the feasibility of
applying the methodology described in Section 3.A for
residual life prediction of the pump system. Therefore, a
simplified physical model of the pump is used instead of
complex finite element models. The flow in the system is
assumed to be steady and the computational fluid dynamics
model is not used. Future application may require advanced
numerical simulation instead of the current simplified model.
The goal for this part of the methodology is to link the
possible causes of the fault, established at the system level
(e.g., change in transmission efficiency from motor to pump,
possible change in surface area of pump vanes, and increase
in resistance parameter) to root causes that are defined by
physics of failure models. An example failure scenario is
illustrated below. Identification of a root cause enables one
to link mathematically the effect of damage at the material
and structural level to the rate of change of system-level
parameters. Studying these rates and simulating system
behavior and performance using these rates provide a
framework for predicting the residual lifetime of the system.

An  Example  Scenario:  Corrosion/erosion  damage  to  vanes 

Corrosion/erosion  damage  removes  material  from 
components,  such  as  vanes  in  the  current  example.  A 
schematic  picture  of  this  type  of  damage  is  shown  in  Figure 
5.  The  corrosion/erosion  damage  may  change  the  vane 
surface  and  reduce  the  total  area  of  vanes  in  moving  flow. 
Also, the irregular surface of vanes may cause turbulence in
the  pump  and  reduce  the  efficiency  in  moving  flow.  All 
these  effects  can  be  considered  as  loss  of  vane  area  (a). 


Figure  5:  Schematic  corrosion/erosion  damage  to  the  vanes 
(left)  and  schematic  pitting  growth  model  (right) 


To  calculate  the  area  loss  due  to  corrosion,  detailed  local 
flow  analysis  and  dissolution  mechanism  are  required.  A 
simple  calculation  is  shown  here  for  illustrative  purpose 
under  simplified  assumptions. 

Step  1:  Identify  system  model 

The corrosion rate $q$ of vane material is taken from [13] as

$$q = K (c_s - c_b) \tag{5}$$

where $K$ is the mass transfer coefficient dependent on the
flow velocity, $c_s$ is the corrosion product concentration at
the liquid-solid interface dependent on the local temperature,
and $c_b$ is the concentration in the bulk flow and is often set
to zero [13]. We assume constant flow velocity and
temperature, and the same concentration in the liquid. Thus, $q$,
$K$ and $c_s$ are constants.

Further,  we  assume  the  corrosion  damage  to  the  vane  occurs 
at  the  edge.  The  vane  area  loss  is  due  to  edge  pitting  growth. 
The  growth  pattern  is  assumed  to  be  circular,  as  shown  in 
Figure  5  (right). 

The area loss at one pitting location can be expressed as

$$\Delta a_i = q \pi r \, \Delta t = q \pi q t \, \Delta t \tag{6}$$

Integrating Eq. (6) over time, we obtain the area loss at one
pitting location as a function of time $t$, as plotted in Figure 6.
The total vane area loss $\Delta a$ can be expressed as

$$\Delta a = \sum_{i=1}^{n} \Delta a_i = \frac{1}{2} \pi \xi^2 t^2, \qquad \xi^2 = (c_s - c_b)^2 \sum_{i=1}^{n} K_i^2 \tag{7}$$

Figure 6: Schematic Plot of Area Loss Function
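
The quadratic growth of the area loss can be checked numerically. The sketch below sums the increments of Eq. (6), reading the pit radius as r = q t, and compares the result with the closed form (pi/2) q^2 t^2 obtained by integration. The corrosion rates and the time span are hypothetical values used only to exercise the formulas.

```python
import math

def pit_area_loss_closed_form(q, t):
    """Integral of Eq. (6) from 0 to t with r = q*t: (pi/2) * q**2 * t**2."""
    return 0.5 * math.pi * q**2 * t**2

def pit_area_loss_numeric(q, t, steps=10000):
    """Midpoint-rule sum of the increments pi * q * r * dt, with r = q * t."""
    dt = t / steps
    total = 0.0
    for k in range(steps):
        t_mid = (k + 0.5) * dt
        total += math.pi * q * (q * t_mid) * dt
    return total

def total_area_loss(rates, t):
    """Sum of the area losses over pitting locations with rates q_i."""
    return sum(pit_area_loss_closed_form(q, t) for q in rates)

q, t = 2.0e-9, 1.0e7          # HYPOTHETICAL corrosion rate [m/s] and elapsed time [s]
closed = pit_area_loss_closed_form(q, t)
numeric = pit_area_loss_numeric(q, t)
total = total_area_loss([q, q, 1.5 * q], t)
print(closed, numeric, total)
```

Doubling the elapsed time quadruples the area loss at each pitting location, which is the behavior sketched in Figure 6.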

At a time instant $t_1$, the pump pressure and flow rate can be
expressed as

$$p_{out}^{*} = (a^{*} \theta - b \phi_{out}^{*}) \, \theta \tag{8}$$

where the superscript $*$ indicates the quantity is at the time
instant $t_1$. Based on first-order perturbation theory, these
quantities can be expressed as

$$p_{out}^{*} = p_{out} + \Delta p, \qquad \phi_{out}^{*} = \phi_{out} - \Delta\phi, \qquad a^{*} = a - \Delta a \tag{9}$$

Substituting Eq. (9) into Eq. (8), we obtain

$$p_{out} + \Delta p = a\theta^2 - \Delta a \, \theta^2 - b \phi_{out} \theta + b \, \Delta\phi \, \theta \tag{10}$$

Substituting Eq. (4) into Eq. (10) and solving for $\Delta a$, we
obtain

$$\Delta a = \frac{b \, \Delta\phi \, \theta - \Delta p}{\theta^2} \tag{11}$$

where $\Delta a$, $\Delta\phi$ and $\Delta p$ are all functions of $t$ during the
entire service life of the pump system.
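
A round-trip check illustrates how Eq. (11) recovers the area loss from changes in pressure and flow. The sketch computes the nominal and degraded pressures from the pump relation, forms the perturbations with the sign conventions of Eq. (9), and inverts them. The values of b, the flow rate and the perturbations are hypothetical.

```python
def pressure(a, b, theta, phi):
    """Pump relation of Eqs. (4) and (8): p = (a*theta - b*phi) * theta."""
    return (a * theta - b * phi) * theta

def area_loss_from_measurements(b, theta, d_phi, d_p):
    """Eq. (11): delta_a = (b * d_phi * theta - d_p) / theta**2."""
    return (b * d_phi * theta - d_p) / theta**2

a, theta = 0.3, 165.0         # values from the simulation setup
b, phi = 0.01, 50.0           # HYPOTHETICAL curvature parameter and flow rate
true_da, d_phi = 0.02, 1.5    # HYPOTHETICAL area loss and flow reduction

p_nominal = pressure(a, b, theta, phi)
p_degraded = pressure(a - true_da, b, theta, phi - d_phi)  # a* = a - da, phi* = phi - d_phi
d_p = p_degraded - p_nominal                               # p* = p + d_p

recovered = area_loss_from_measurements(b, theta, d_phi, d_p)
print(recovered)
```

Because the pump relation is linear in the area and the flow rate, the recovery is exact here; with noisy measurements it would hold only approximately.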


Eq. (3) consists of a fast-time dynamic equation and a
measurement equation. To construct a complete prognostic
model, we need a slow-time model for the degradation
measure $\xi$. In this model, we assume that the area loss of the
vanes during one simulation episode $i$, $\Delta\alpha_i$, is a function of
$\xi$, the area loss at the beginning of the episode ($\alpha_i$), and the
load during this episode, in the form of
Aa,  =  0.1  Cm,.)  ■fj{pfl  •[1.122-1.4(a(  /a)2(12) 

;+ 7.33(0, /a) -13.08(0, /a)3  +  14(o,  /a)*]2 

where $n_i$ is the cycle number and $\{p_j\}_{j=1}^{n_i}$ is the load
parameter, both of which are obtained via a cycle
counting approach, such as rainflow or mean-crossing counting
[10], based on the stress/strain/load information during the
episode, which is assumed to be a function of $\phi_{out}$. In this
paper, we adopt the most commonly used cycle counting
method, viz., the rainflow method. This method is able to
catch both slow and rapid variations of load by forming
cycles that pair high maxima with low minima, even if they
are separated by intermediate extremes [14]. We assume that
the  maximum  Ct  is  a  /  4  and  define  the  degradation 
measure  as  E,  —  Act  /  a  .  Apparently,  £  =  1  marks  the  end 
of  life  for  the  pump. 

Step  2:  Simulation  results 

The system model in Eq. (3) was simulated with a standard
4th-order variable-step-size Runge-Kutta method. Figure 7
shows the results of 100 Monte-Carlo simulations for the
pump system under three different load conditions, with the
input torques equal to 2000, 2500, and 3000 Nm,
respectively, and the angular velocity set at 165 rad/s for
all three modes. The nominal cross-sectional area of the
pump is $a = 0.3\ \mathrm{m}^2$. Perturbations are added to the inputs,
i.e., the torque and the angular velocity, in the form of
Gaussian noise with zero mean and variances that are set to
make the signal-to-noise ratio (SNR) equal to 1 in all three
modes. From Figure 7, we can see that compared to the
severely overloaded condition, the increases in the life times
for the overloaded and normal conditions are about 40% and
140%, respectively. If we assume a 50% calendar time usage
of the pump system (12 hours a day), the expected life of the
pump system will be approximately 3.8, 5.4, and 9.2 years,
respectively, for the three load conditions (severely
overloaded, overloaded and normal).
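
The arithmetic behind these figures can be checked with a short sketch. It converts calendar life to operating time under the stated 50% usage and computes the life increase relative to the severely overloaded condition. A 365-day year is assumed; the helper name is hypothetical.

```python
# Expected calendar lives (years) reported for the three load conditions.
lives_years = {"severely overloaded": 3.8, "overloaded": 5.4, "normal": 9.2}

USAGE_FRACTION = 0.5                 # 12 operating hours per day
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumes a 365-day year

def operating_seconds(calendar_years, usage=USAGE_FRACTION):
    """Operating time accumulated over a calendar life at the given usage."""
    return calendar_years * usage * SECONDS_PER_YEAR

baseline = lives_years["severely overloaded"]
increase_pct = {name: 100.0 * (life / baseline - 1.0)
                for name, life in lives_years.items()}

for name, life in lives_years.items():
    print(f"{name}: {operating_seconds(life):.3e} s operating, "
          f"{increase_pct[name]:.0f}% longer than baseline")
```

The computed increases of roughly 42% and 142% agree with the rounded figures of about 40% and 140% quoted above.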


Step  3:  Prognostic  modeling  from  simulation  data 

Since  the  pump  system  has  three  random  load  conditions, 
the  number  of  modes  in  the  degradation  model  is  3.  The 
model  parameters  for  each  mode  can  easily  be  estimated 
from  the  Monte-Carlo  simulations  as  described  in  [6]. 


Figure  7:  100  Monte-Carlo  Simulations  for  3  Random  Loads 


Step  4:  Feature  estimation 


The hidden variable $\xi$ is estimated from the input/output
data,  i.e.,  input  torque,  input  angular  velocity  and  output 
flow  rate,  through  linear  least-squares  estimation  by 
minimizing  the  sum  of  square  error  between  the  estimated 
output  and  the  actual  measurement  over  the  observation 
period. 
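
One way to picture this estimation step is a one-parameter linear least-squares fit of the effective vane area from torque, angular velocity and flow data, using the low-acceleration pump relation tau = phi (a theta - b phi). This is an illustrative assumption about how the regression could be set up, not the procedure of [6]; the value of b and the data are hypothetical.

```python
def estimate_vane_area(torques, thetas, flows, b):
    """Least-squares fit of a in  tau + b*phi**2 = a * (phi*theta).

    Minimizes the sum of squared errors over the observation period for a
    model that is linear in the single unknown a.
    """
    num = 0.0
    den = 0.0
    for tau, theta, phi in zip(torques, thetas, flows):
        x = phi * theta          # regressor
        y = tau + b * phi**2     # target built from the measurements
        num += x * y
        den += x * x
    return num / den

# HYPOTHETICAL noise-free data generated with a degraded area of 0.28 m^2.
b, a_true, a_nominal = 0.01, 0.28, 0.3
thetas = [160.0, 165.0, 170.0]
flows = [48.0, 50.0, 52.0]
torques = [phi * (a_true * th - b * phi) for th, phi in zip(thetas, flows)]

a_hat = estimate_vane_area(torques, thetas, flows, b)
area_loss = a_nominal - a_hat
print(a_hat, area_loss)
```

The estimated area loss can then be normalized to obtain a degradation measure that the tracking step follows over time.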


Step  5:  Track  the  degradation  measure 


For IMM implementation, we use the following transition
matrix:

$$\Phi = \begin{bmatrix} 0.9 & 0.05 & 0.05 \\ 0.05 & 0.9 & 0.05 \\ 0.05 & 0.05 & 0.9 \end{bmatrix}$$

where $\phi_{ij} = P(\text{mode } j \text{ in effect at time } k+1 \mid \text{mode } i \text{ in effect at time } k)$.
The system mode changes are simulated as follows.
Mode 1: $[0, 40] \times 10^6$ s, Mode 2: $[40, 80] \times 10^6$ s, Mode 3:
$[80 \times 10^6\ \mathrm{s}, t_{end}]$, where $t_{end}$ is the time at which $\xi = 1$.


Figure 8 shows the plot of mode probabilities of the IMM.
The mode probabilities for the three modes are initialized to
$\mu_j(0) = 1/3$, $j = 1, 2, 3$, and then Mode 1 reaches the highest
mode probability (approximately 0.80-0.90) in the range
$[0, 40] \times 10^6$ s. Mode 2 reaches the highest mode probability
(approximately 0.90-1.0) in the range $[40, 80] \times 10^6$ s. Finally,
Mode 3 dominates the remainder of the simulation with the
highest probability of around 1. Thus, the IMM (which may
be viewed as a software sensor) tracks the load conditions
very well based on noisy data.
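
The mode-probability recursion at the heart of the IMM can be sketched as follows. Only the Markov prediction with the transition matrix and the Bayes update with per-mode likelihoods are shown; the mixing of the mode-matched state estimates is omitted. The Gaussian likelihood, its standard deviation and the residual values are assumptions for illustration.

```python
import math

# Transition matrix used for the IMM; entry [i][j] = P(mode j at k+1 | mode i at k).
PHI = [[0.90, 0.05, 0.05],
       [0.05, 0.90, 0.05],
       [0.05, 0.05, 0.90]]

def predict_mode_probs(mu, phi=PHI):
    """Markov prediction: mu_pred[j] = sum_i phi[i][j] * mu[i]."""
    n = len(mu)
    return [sum(phi[i][j] * mu[i] for i in range(n)) for j in range(n)]

def update_mode_probs(mu, likelihoods, phi=PHI):
    """Bayes update of the predicted mode probabilities with per-mode likelihoods."""
    pred = predict_mode_probs(mu, phi)
    unnorm = [lik * p for lik, p in zip(likelihoods, pred)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def gaussian_likelihood(residual, sigma):
    """ASSUMED zero-mean Gaussian likelihood of a mode-matched filter residual."""
    return math.exp(-0.5 * (residual / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

mu = [1.0 / 3.0] * 3                 # uniform initialization, as in the text
residuals = [0.1, 1.0, 2.0]          # HYPOTHETICAL residuals; Mode 1 fits best
for _ in range(5):
    mu = update_mode_probs(mu, [gaussian_likelihood(r, 0.5) for r in residuals])
print([round(m, 3) for m in mu])
```

The diagonal value of 0.9 makes the mode estimate sticky, so a mode must be supported by the data over several steps before its probability takes over.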


Figure 8: Mode Probabilities of the IMM


Step 6: Predict the Remaining Life

Figure 9 presents the estimate of residual life (solid line)
using IMM mode probabilities for a single run of the
scenario considered in Step 5. We can see that, initially, the
residual life estimate follows Mode 1 and after switching to
Mode 2, the residual life estimate is in between those of
Modes 1 and 2, which is what one would expect. Finally, the
residual life estimate approaches that corresponding to
Mode 2. The dashed bold line represents the residual life
estimates assuming that the load condition can be measured
accurately via a sensor. In such a case, the current mode is
known. We can evaluate the contribution of the additional
sensor to the accuracy of the residual life estimate. In Figure
9, we can see that the IMM produces residual life estimates
that are very close to the estimates that one would get with
the additional sensor. The difference between these two
estimates is relatively high (about 10%) at the beginning
($\xi < 0.2$), and they are virtually identical as the degradation
measure $\xi$ increases.


4. Spares Allocation Assessment

A. System Availability Analysis Approach

The system availability analysis methodology comprises
both component-level and system-level analyses.

A.1 Component-level Model:

We  describe  the  lower  tier  component-level  availability 
model  in  this  section.  Assume  that  the  time  to  failure  of  a 
component  is  exponentially  distributed.  When  a  component 
fails,  it  is  replaced  immediately  with  a  spare  if  a  spare  is 
available.  However,  if  a  spare  is  not  available,  then 
additional  spares  need  to  be  procured  for  replacement.  The 
repair  times,  both  with  and  without  the  availability  of  spares 
are  assumed  to  be  exponentially  distributed.  For  component 
i,  let: 

$s_i$ - Number of spares.

$\lambda_i$ - Failure rate.

$\mu_i$ - Repair (replacement) rate when a spare is available.

$\gamma_i$ - Repair (replacement) rate when a spare is not available.

$p_k$ - Probability that $k$ spares are obtained to rebuild
inventory.

Figure 10 shows the Markov model of the failure and repair
process of a component in the presence and absence of
spares. In the model, the state is described by a 2-tuple $(c,d)$,
where $c$ is the number of spares available in the inventory
and $d$ indicates whether the component is operational or
failed. Thus, $c$ ranges from $s_i$ to 0 and $d$ can take two values,
namely, U (Up) and D (Down). The component starts in
state $(s_i,U)$, where all $s_i$ spares are available and the
component is operational. From this state, the component
transitions to state $(s_i,D)$ with rate $\lambda_i$ upon failure.
From state $(s_i,D)$, a transition occurs to state $(s_i-1,U)$
with rate $\mu_i$ when the failed component is replaced by an
available spare. This continues until state $(0,U)$ is reached,
in which the component is operational but no additional spares
are available. A failure in this state causes a transition to
state $(0,D)$ with rate $\lambda_i$. In state $(0,D)$, since no spares are
available, additional spares need to be procured. At least
one component must be obtained to replace the
failed one. In addition, extra spares may be obtained to
rebuild the inventory of spares. With probability $p_k$,
$k$ additional spares are obtained, where $k$ ranges from 0 to $s_i$.
As a result, from state $(0,D)$, a transition to state $(k,U)$
occurs with rate $\gamma_i p_k$.

Figure 10: Component-level Availability Model

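To make the component-level model concrete, the following minimal sketch builds the generator matrix of the Markov chain described above and solves for its steady-state distribution. The function names (`solve_linear`, `component_availability`), the parameter names (`s`, `lam`, `mu`, `gamma`, `p`) and the numerical values in the usage note are illustrative assumptions, not taken from the paper; the paper does not state which solution method was used.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for j in range(col, n + 1):
                    M[r][j] -= f * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def component_availability(s, lam, mu, gamma, p):
    """Steady-state probability that the component is Up.

    s     : number of spares (hypothetical parameter name)
    lam   : failure rate
    mu    : replacement rate when a spare is available
    gamma : replacement rate when no spare is available
    p     : p[k] = probability that k spares are restocked, k = 0..s
    """
    n = 2 * (s + 1)
    up = lambda c: 2 * c          # index of state (c, U)
    down = lambda c: 2 * c + 1    # index of state (c, D)
    Q = [[0.0] * n for _ in range(n)]
    for c in range(s + 1):
        Q[up(c)][down(c)] += lam              # failure
        if c >= 1:
            Q[down(c)][up(c - 1)] += mu       # replace using a spare
    for k in range(s + 1):
        Q[down(0)][up(k)] += gamma * p[k]     # procure and restock k spares
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    # Solve pi Q = 0 with sum(pi) = 1: transpose Q and replace the last
    # balance equation with the normalization condition.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    pi = solve_linear(A, b)
    return sum(pi[up(c)] for c in range(s + 1))
```

For example, `component_availability(2, 1e-4, 0.1, 0.01, [0.0, 0.0, 1.0])` evaluates a component with two spares whose inventory is always fully rebuilt after a stock-out. Comparing it with `component_availability(0, 1e-4, 0.1, 0.01, [1.0])` shows the availability gained by holding spares under these assumed rates.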
A.2 System-level Model:

The  system  availability  model  at  the  higher  tier  of  the 
hierarchy  is  described  in  this  section.  In  the  base  scenario, 
all  the  components  of  the  system  are  active  when  the  system 
is  operational.  In  this  case,  the  system  may  be  considered  to 
have  just  one  phase  and  the  components  of  the  system  may 
be  organized  into  a  series,  parallel,  k-of-n  or  a  combination 
of  these  structures.  A  reliability  block  diagram  may  be  an 
adequate  system  availability  model  in  this  case. 
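As a rough illustration of the single-phase case, the sketch below combines component availabilities through series, parallel and k-of-n blocks of a reliability block diagram. It assumes that components fail and are repaired independently, which the paper does not state explicitly; the function names are hypothetical.

```python
from math import prod


def series(avails):
    """All components must be up."""
    return prod(avails)


def parallel(avails):
    """At least one component must be up."""
    return 1.0 - prod(1.0 - a for a in avails)


def k_of_n(k, avails):
    """At least k of the n components must be up.

    dist[j] holds the probability that exactly j of the components
    processed so far are up (components may have different availabilities).
    """
    dist = [1.0]
    for a in avails:
        new = [0.0] * (len(dist) + 1)
        for j, pr in enumerate(dist):
            new[j] += pr * (1.0 - a)   # this component is down
            new[j + 1] += pr * a       # this component is up
        dist = new
    return sum(dist[k:])
```

Nested structures follow by composition, e.g. `series([a1, parallel([a2, a3])])` for one component in series with a redundant pair.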

For  some  systems,  execution  may  proceed  through  phases, 
and  not  all  components  are  active/operational  in  all  phases. 
For  such  a  system,  the  phased  execution  may  be  modeled  as 
a  continuous  time  Markov  chain,  with  the  state  space  given 
by the phase of system execution. Let $m$ denote the number
of phases, with the sojourn time in each phase exponentially
distributed. Each phase $l$ has its own rate parameter for the
exponential distribution of its sojourn time. Further,
a transition probability is defined for each pair of phases,
giving the probability that the system transitions
from phase $l$ to phase $r$, with $l$ and $r$ ranging from 1 through
$m$.
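The phased-execution model can be sketched as follows. Given a sojourn-time rate for each phase and a phase-to-phase transition probability matrix, the code estimates the long-run fraction of time the system spends in each phase. The names `rates` and `P`, the use of a damped power iteration, and the three-phase example are assumptions made for illustration; the paper does not give numerical values or a solution method for this model.

```python
def phase_occupancy(rates, P, iters=2000):
    """Long-run fraction of time spent in each phase.

    rates[l] : rate of the exponential sojourn time in phase l
    P[l][r]  : probability of moving from phase l to phase r (rows sum to 1)
    """
    m = len(rates)
    # Stationary distribution of the embedded jump chain, computed with a
    # "lazy" power iteration (I + P) / 2 so that periodic chains converge.
    nu = [1.0 / m] * m
    for _ in range(iters):
        stepped = [sum(nu[l] * P[l][r] for l in range(m)) for r in range(m)]
        nu = [0.5 * (nu[r] + stepped[r]) for r in range(m)]
    # Weight each phase by its mean sojourn time 1 / rates[l].
    weights = [nu[l] / rates[l] for l in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-phase cycle 0 -> 1 -> 2 -> 0.
rates = [1.0, 0.5, 0.25]
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
occupancy = phase_occupancy(rates, P)
```

In this hypothetical cycle each phase is visited equally often, so the occupancy is proportional to the mean sojourn times of 1, 2 and 4 time units.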

A.3 Computation of System Availability

Most  of  the  efforts  in  evaluating  highly  dependable  systems 
are  limited  to  analytical  or  numerical  solutions,  usually 
restricted  to  Markov  models.  The  applicability  of  these 
techniques,  however,  is  hindered  by  practical  problems,  such 
as  state-space  explosion  of  Markov  representations  of  real 
systems.  Because  the  number  of  states  in  Markov  models 
usually  grows  exponentially  with  the  number  of  system 
components,  and  because  of  storage  and  computational 
limitations,  only  relatively  small  systems  can  be  analyzed 
using  numerical  solution  techniques. 

When  conventional  analytical/numerical  methods  are  no 
longer  feasible,  analysts  often  turn  to  computer  simulation, 
with  the  obvious  advantages  of  flexible  representation  of 
complex  systems  at  the  desired  level  of  abstraction  and  low 
storage requirements. However, accurate estimation of
availability using simulation requires frequent observation
of system failure events, which by definition are rare
in highly dependable systems. This renders
conventional simulation impractical for evaluating such
systems. To solve this problem, there have been
considerable and successful efforts to develop fast
simulation techniques based on Importance Sampling. The
basic idea of Importance Sampling is quite simple: simulate
the system using new probability dynamics (different from
the original probability dynamics of the system), so as to
increase the probability of typical sequences of events
leading to system failures. The availability measure obtained
in a given observation is then multiplied by a correction
factor called the “likelihood ratio” to yield an unbiased
estimate of the measure. Appropriate and careful choice of
the  new  underlying  probability  dynamics  of  the  simulated 
system  can  yield  an  appreciable  reduction  in  the  variance  of 
the  resulting  estimate,  which  implies  appreciable  reduction 
in  the  simulation  time  needed  to  achieve  a  specified 
precision  [5]. 
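The likelihood-ratio correction can be illustrated with a deliberately simplified, static example. The sketch below estimates the probability that a small system is failed at a snapshot in time, where each component is independently failed with a small probability and the system fails when every component in some minimal cut set is failed. It is not the dynamic simulation scheme of [17]; the component probabilities, cut sets, biased probabilities and function names are all hypothetical.

```python
import random
from itertools import product


def system_failed(x, cut_sets):
    """True if every component of at least one cut set is failed."""
    return any(all(x[i] for i in cs) for cs in cut_sets)


def exact_failure_prob(q, cut_sets):
    """Reference value by enumerating all component states (small n only)."""
    total = 0.0
    for x in product([0, 1], repeat=len(q)):
        if system_failed(x, cut_sets):
            pr = 1.0
            for xi, qi in zip(x, q):
                pr *= qi if xi else 1.0 - qi
            total += pr
    return total


def importance_sampling(q, q_biased, cut_sets, n_samples, seed=0):
    """Sample component states under q_biased and reweight each
    observation by the likelihood ratio of the original to biased law."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x, lr = [], 1.0
        for qi, bi in zip(q, q_biased):
            failed = rng.random() < bi
            x.append(failed)
            lr *= qi / bi if failed else (1.0 - qi) / (1.0 - bi)
        if system_failed(x, cut_sets):
            acc += lr
    return acc / n_samples


q = [1e-3, 2e-3, 1.5e-3]        # hypothetical failure probabilities
cut_sets = [(0, 1), (1, 2)]     # hypothetical minimal cut sets
exact = exact_failure_prob(q, cut_sets)
estimate = importance_sampling(q, [0.5, 0.5, 0.5], cut_sets, 20000)
```

With the original probabilities, a system failure would be observed only a few times per million samples, so plain simulation with 20000 samples would most likely see none. Under the biased probabilities, failures occur in a large fraction of the samples and the likelihood ratio restores an unbiased estimate.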

The methodology of applying the Importance Sampling
approach to estimate the system availability with spares
information is outlined in [17].

B.  A  Case  Study 

In  this  section,  we  apply  the  techniques  described  above  to 
evaluate  and  compare  different  spares  allocation  schemes.  In 
this study, we obtained a Detailed Maintenance Data report
generated for Type Equipment Code (TEC) AMAF
and Work Unit Code (WUC) “27%” (F/A-18 Engine
System) during 10/01/05 - 10/10/06 from the Navy’s
DECKPLATE (Decision Knowledge Programming for
Logistics Analysis and Technical Evaluation) data
warehouse. For each BuSer (Bureau Serial No.), this report
records the maintenance action taken, the WUC, the
maintenance level, the man-hours spent, the type of
maintenance, the removed part No., and the installed
part No. Out of the 10930 entries, the WUCs
with prefix “2747” have the most entries. Therefore, we
chose the engine subsystem and its components with WUCs
of the form “2747*” to build a conceptual TEAMS [15]-[16]
hierarchical model for demonstration.

The TEAMS model, as shown in Figure 12, consists of 31
components, corresponding to the list of WUCs with
prefix “2747” obtained from the Navy’s CMIS (Configuration
Management Information System). The components
with the longest WUCs are at the bottom level of the
hierarchy. The mean-time-to-repair for each component is
estimated from the man-hours spent, as shown in Figure
13, while the failure frequencies are used as estimates of the
failure rates. Figure 11 shows a pivot chart of the failure
frequencies as recorded in the report. The chart shows that
the Anti-icing Valve (WUC 2747D) has the most entries
during the reporting period. For this study, its failure rate
was set to 0.0001, while all the other components were
assigned lower failure rates proportional to their failure
frequencies as recorded in the report. In general, all
components  were  assumed  to  be  single  points  of  failure 
(series  arrangement  in  the  reliability  block  diagram); 
however,  to  demonstrate  the  ability  to  handle  other  system 
configurations,  some  redundant  configurations  have  also 
been  assumed  in  the  model,  as  shown  in  the  small  block  in 
Figure  12. 

Figure 11: Pivot Chart of Maintenance Records with WUC Codes “2747*” (axes: WUCCode, Total Number)


Figure 12: A Conceptual Hierarchical TEAMS Model for
Engine Components with WUC Codes “2747*”

Figure 13: Mean Time to Repair Estimated from the
Maintenance Records

Figure 14: Redundancy Analysis Report Showing the List of
Minimal Cut Sets and Their Probabilities at 10000 Hours


After  the  conceptual  engine  component  model  was  built  in 
TEAMS,  various  types  of  analysis,  such  as  redundancy 
analysis  and  reliability  analysis,  were  performed. 
Redundancy  analysis  generates  a  list  of  minimal  cut  sets 
based  on  the  TEAMS  model  and  calculates  the  probability 
of  each  minimal  cut  set  at  the  user-defined  mission  time. 
Figure  14  shows  the  redundancy  report  generated  for  the 
engine  component  model.  There  are  24  minimal  cut  sets  in 
this  model,  out  of  which  17  are  singletons  (cut  sets  of  size  1) 
and 7 are doubletons (cut sets of size 2). The list of minimal
cut sets and the list of components with their failure rates,
mean-time-to-repair, etc., are output to the spares analysis module
to estimate, for a spares allocation scenario, how the
system-level availability evolves over time. We simulate two
scenarios,  one  for  different  spares  allocation  schemes  and 
the  other  for  different  residual  life  time  predictions  for  the 
main  fuel  pump.  In  the  simulation,  the  mean-time-to-repair 
when  the  spares  are  out-of-stock  is  set  to  be  10  times  the 
mean-time-to-repair  when  the  spares  are  in  stock.  The  repair 
time  is  assumed  to  be  exponentially  distributed. 
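As a simple illustration of what a minimal cut set probability at a mission time means, the sketch below assumes independent components with exponentially distributed times to failure and no repair during the mission. How TEAMS computes these probabilities internally is not described in the paper, so this is only one plausible reading. The anti-icing valve rate of 0.0001 comes from the text (taken here as per hour, which is an assumption); the other component names and rates are hypothetical.

```python
from math import exp, prod


def failure_prob(rate, t):
    """Probability that a component has failed by time t (no repair)."""
    return 1.0 - exp(-rate * t)


def cut_set_prob(cut_set, rates, t):
    """Probability that every component in the cut set has failed by t,
    assuming independent component failures."""
    return prod(failure_prob(rates[c], t) for c in cut_set)


rates = {
    "2747D": 1e-4,     # anti-icing valve, rate from the text (units assumed per hour)
    "comp_B": 2e-5,    # hypothetical component
    "comp_C": 1e-5,    # hypothetical component
}
mission_time = 10000.0  # hours, as in the Figure 14 caption

singleton = cut_set_prob(("2747D",), rates, mission_time)
doubleton = cut_set_prob(("comp_B", "comp_C"), rates, mission_time)
```

A singleton cut set fails as soon as its one component fails, whereas a doubleton requires both redundant components to fail, which is why redundant configurations contribute much smaller probabilities.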

Figure 15: Plot of System Unavailability vs. Time for
Different Spares Allocation Schemes

Figure 16: Plot of System Unavailability vs. Time for
Different Residual Life Estimates


Figure 15 shows the system unavailability curves for three
different spares allocation schemes. In the first scheme, we
assume that the number of spares allocated to each
component is proportional to its failure rate, i.e., the
component that fails most frequently gets the most
spares. The component that fails the most in the system, i.e.,
the Anti-icing Valve, gets 20 spares when the stock is being
replenished. In the second scheme, every component gets
only half the number of spares as in the first scheme.
Finally, in the third scheme, the Anti-icing Valve gets 3
spares, while all the other components get the same number
of spares as in the first scheme.
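The three schemes can be written down compactly as below. The rounding rule, the integer halving, and all component names and failure rates other than the anti-icing valve (WUC 2747D, rate 0.0001, 20 spares) are assumptions introduced for illustration; the paper does not specify how fractional allocations were handled.

```python
def proportional_allocation(rates, max_spares):
    """Scheme 1: spares proportional to failure rate, with the most
    failure-prone component receiving max_spares (rounding is assumed)."""
    top = max(rates.values())
    return {c: round(max_spares * r / top) for c, r in rates.items()}


def halved(allocation):
    """Scheme 2: every component gets half of its scheme-1 spares
    (integer division is an assumption)."""
    return {c: n // 2 for c, n in allocation.items()}


def with_override(allocation, component, spares):
    """Scheme 3: same as scheme 1 except for one component."""
    updated = dict(allocation)
    updated[component] = spares
    return updated


rates = {
    "2747D": 1e-4,    # anti-icing valve, rate from the text
    "comp_B": 5e-5,   # hypothetical component
    "comp_C": 2e-5,   # hypothetical component
}
scheme1 = proportional_allocation(rates, 20)
scheme2 = halved(scheme1)
scheme3 = with_override(scheme1, "2747D", 3)
```

Each resulting allocation would then be passed to the availability analysis to produce one unavailability curve per scheme.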

Figure 16 shows the system unavailability curves when the
residual lifetime prediction of the main fuel pump varies. In
this scenario, we apply the first spares allocation scheme
from the first scenario. In the first case, the
residual life time of the main fuel pump is derived from its
nominal failure frequency and is approximately 65581
hours. In the other three cases, we assume that, based on
prognosis, the main fuel pump’s residual life time has been
updated to 6558, 1312, and 656 hours, respectively.
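One simple way to feed such an update into the availability model, assuming the exponential time-to-failure model of the component-level analysis, is to treat the residual life as a mean time to failure and use its reciprocal as the failure rate. The paper does not state the exact conversion it uses, so the sketch below is an assumption.

```python
def rate_from_residual_life(hours):
    """Failure rate implied by a mean residual life, assuming an
    exponential time to failure (mean = 1 / rate)."""
    return 1.0 / hours


nominal_life = 65581.0                   # hours, from the text
updated_lives = [6558.0, 1312.0, 656.0]  # hours, from the text

nominal_rate = rate_from_residual_life(nominal_life)
# Factor by which each prognostic update raises the nominal failure rate.
multipliers = [rate_from_residual_life(h) / nominal_rate for h in updated_lives]
```

Under this reading, the three updated cases correspond to failure rates roughly 10, 50 and 100 times the nominal one.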

As we can see from Figure 15 and Figure 16, the spares
analysis module allows a user to evaluate and compare
candidate spares allocation schemes, predict the
system availability trend, and select the scheme that yields
the highest system availability within the
budget constraints. Moreover, the system availability
prediction is updated whenever the prognosis-based residual
life time estimate for any component in the system
changes, which allows one to predict future
spares usage and make proactive asset management
decisions.




5.  Conclusion 


In  this  paper,  an  integrated  model-based  prognostic  process 
was  applied  to  predict  the  residual  life  time  of  a  generic 
centrifugal  pump  system.  In  this  process,  we  used  singular 
perturbation  methods  of  control  theory,  coupled  with 
dynamic state estimation techniques. An IMM filter was
employed to estimate the degradation measure, and the
time-averaged mode probabilities were used to predict the residual
life time. The residual life prediction can be used for
demand  forecasting  and  hence  assist  in  spares  allocation  and 
inventory  management.  A  “smart  services”  solution 
supported  by  diagnostics  and  prognostics  was  also 
presented,  which  integrates  maintenance  and  inventory 
management  with  system  operating  conditions  and  health 
information  collected  and  inferred  by  a  diagnosis  and 
prognosis  server.  A  case  study  was  conducted  to 
demonstrate  the  process  of  updating  the  system  availability 
analysis  results  using  the  residual  life  estimates  from 
prognosis and assessing the efficacy of different spares
allocation schemes in maximizing system availability.

There  are  several  future  extensions  to  this  research  work. 
These  include  handling  multiple  degradations  in  the 
prognosis  process,  application  of  the  prognosis  and  spares 
allocation  assessment  processes  to  real-world  systems,  and 
optimization  of  spares  allocation  using  evolutionary 
approaches. 